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Abstract 

Single particle diffraction imaging experiments at free-electron lasers (FEL) have a great potential 
for structure determination of reproducible biological specimens that can not be crystallized. One of 
the challenges in processing the data from such an experiment is to determine correct orientation 
of each diffraction pattern from samples randomly injected in the FEL beam. We propose an 
algorithm [1] that can solve this problem and can be applied to samples from tens of nanometers to 
microns in size, measured with sub-nanometer resolution in the presence of noise. This is achieved 
by the simultaneous analysis of a large number of diffraction patterns corresponding to different 
orientations of the particles. The algorithms efficiency is demonstrated for two biological samples, 
an artificial protein structure without any symmetry and a virus with icosahedral symmetry. Both 
structures are few tens of nanometers in size and consist of more than 100 000 non-hydrogen atoms. 
More than 10 000 diffraction patterns with Poisson noise were simulated and analyzed for each 
structure. Our simulations indicate the possibility to achieve resolution of about 3.3 A at 3 A 
wavelength and incoming flux of 10^^ photons per pulse focused to 100x100 nm^. 
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I. INTRODUCTION 



The problem of solving the structure of individual biological specimens to high resolution 
is critical for many branches of modern life- and bio-science. Two widely used techniques for 
high resolution structure determination are x-ray crystallography and electron microscopy. 
X-ray crystallography can only be used for molecules that form crystals [2J, whereas trans- 
mission electron microscopy is limited to structures with a thickness well below one micron 
[3]. Therefore the majority of samples must be sliced |4j and the minimum thickness of the 
slices limits the resolution of this method. 

Single particle coherent diffraction imaging [SHTj is one of the promising new techniques 
for the investigation of biological samples to subnanometer resolution. It has become possi- 
ble only recently due to the development of x-ray free-electron lasers [814TT] , which produce 
ultra-short (less than 100 fs), coherent x-ray pulses with high intensity (more than 10^^ 
photons in a single pulse). Short and intense pulses are required to overcome the radia- 
tion damage of biological particles during the pulse propagation [T2] - [T^ and to produce a 
high number of elastically scattered photons [5l [I5]. The coherence of the incident beam 
is important for a successful reconstruction of the electron density of the sample [T614T8] . 
However, after the pulse propagation the particles will explode, and only one projection of 
the sample can be measured. This problem can be overcome by injecting particles one after 
another with random orientations and collecting a set of diffraction patterns Each mea- 
sured diffraction pattern corresponds then to an unknown particle orientation. A method 
to determine the orientation of the particle, corresponding to each diffraction pattern, is the 
main subject of this paper. When the relative angular orientation of all diffraction patterns 
is determined the full three-dimensional (3D) intensity distribution in reciprocal space can 
be obtained. The structural information, or electron density of the sample is determined 
then by the phase retrieval [121 |2U] . 

During the last few years there was a significant progress in the practical implementation 
of these ideas at hard x-ray FELs (see, for example, pTH25] ). There were few attempts 
to determine three-dimensional (3D) structure in single particle imaging experiments [2^ . 
however, the methods are still under development. Several approaches have been proposed 
so far to find an unknown particle orientation in these experiments. One is based on the 
common arc algorithm [^ originally developed for electron microscopy [26| [27] . This al- 

2 



gorithm exploits the fact that all two-dimensional (2D) diffraction patterns of reproducible 
particles in random orientations represent sections by the Ewald sphere of the 3D intensity 
distribution in reciprocal space. As such, all diffraction patterns have one common point, 
the origin of reciprocal space, and intersect along common arcs. The intensities along these 
arcs must be equal, and using this information the relative orientation of all diffraction pat- 
terns can be determined. The main problem of this method is its demand for a high signal 
to noise ratio, which is difficult to satisfy even with the present high power FEL sources. 
It was suggested to overcome this limitation by an additional classification step |2SII2H], in 
which diffraction patterns with similar particle orientations are averaged prior to orientation 
determination. This step improves the statistics of each averaged diffraction pattern, but at 
the same time reduces its contrast. As a result, the classification step decreases the achiev- 
able resolution and can produce artifacts in the final stage of electron density reconstruction. 
Another method is based on generative topographic mapping and neural networks [291 EO] ■ 
This approach works well for a low signal to noise ratio but scales poorly with the number 
of resolution elements in terms of computational time and memory. The same is valid for a 
method based on an expectation maximization technique |31j . 

Here, we propose an orientation determination method based on an improved common 
arc algorithm [1]. Instead of a classification step we perform a simultaneous analysis of 
common arcs between many diffraction patterns. To improve the quality of the orientation 
determination, a 3D angular refinement procedure is applied at the final step. This algorithm 
works well even with a low photon signal down to 0.05 photons per pixel for sampling rate 
of three at the edge of the detector. It scales linearly with the number of resolution elements 
and number of measured diffraction patterns. Memory requirements are relaxed because 
most of the data can be processed in parts. Finally, the algorithm is highly parallelizable 
since most of the analysis is done between pairs of diffraction patterns. 

The paper is organized in the following way. In section two we describe our implemen- 
tation of the common arc algorithm. Section three describes our approach to treat poor 
signal to noise ratio data as well as orientation refinement procedure. Tests of the proposed 
algorithm on simulated data from two different biological structures are presented in sec- 
tion four. The paper is completed by the conclusion section. The details of the algorithm 
implementation are presented in the Appendix. 
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II. COMMON ARC ALGORITHM 



In a typical single particle diffraction imaging experiment, a sample with unknown orien- 
tation is injected into the focused coherent x-ray beam of an FEL (Fig. [l|a). The scattered 
radiation is measured in the far field by a 2D detector. This diffraction pattern can be 
mapped on an Ewald sphere ^2j, and represents a 2D cut of the 3D intensity distribution 
in reciprocal space (Fig. [l|b). Alternatively, the diffraction pattern can be considered as a 
perspective projection of an Ewald sphere sector onto the 2D detector plane as viewed from 
the sample position (Fig. |2]). 

Our previous studies suggest [7] that, in order to increase the scattered signal, it is 
favorable to use longer x-ray wavelengths, since the x-ray scattering cross-sections are larger 
at these wavelengths. At the same time the energy of the incident x-rays should be sufficient 
to penetrate the sample. To achieve high resolution the detector should also cover high 
scattering angles. Under these conditions a large sector of an Ewald sphere is covered, 
which is beneficial for orientation determination. 

When two independent measurements of identical particles with different orientations are 
considered, the orientation of the first particle can be fixed as known. The orientation of 
the second particle can be uniquely described relative to the first one. Alternatively, two 
measured diffraction patterns could be considered to originate from the same particle in 
different experimental geometries. In this case the particle orientation is fixed, but the di- 
rection of the incident beam and the detector orientation are different for each measurement 
as shown in Fig. |2j For the first measurement the incident beam direction, given by its 
wavevector K^i, can be taken along the axis in the reciprocal space coordinate system 
shown in Fig. [T|b. The direction of the incident beam for the second measurement is given 
by its wavevector Kj2 (Fig. |2]). The relative orientation of the second geometry with respect 
to the first one can be described by three Euler angles 0, 9, ip The choice of Euler angles 
is convenient, since rotations around the angles (p and in reciprocal space are equivalent 
to rotations by the same angles of detectors one and two in real space, respectively. 

For monochromatic x-rays the Ewald sphere has the radius K = 2tt/X, where A is the 
wavelength of the incident radiation. The Ewald spheres corresponding to the two mea- 
surements pass through the origin of the reciprocal space coordinate system (Fig. |2]). The 
origin of the first Ewald sphere (see point A in Fig. [2]), for the incident vector Kji is at 
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(0,0, —K) and the origin of the second sphere (point B in Fig. [2]), for the incident vector 
Kj2, is at (g^^O) Qzo) ■ The coordinates q^o, qyo, Qzo are determined by a rotation of the 
point (0, 0, —K) around the reciprocal space origin (0, 0, 0) by the Euler angles and 9. 
The intersection of the two spheres is a common arc that also passes through the origin of 
reciprocal space (see Fig. |3|a). This common arc is projected on the two detectors (curves 
a and b in Fig. [2]). It is clear from this construction that the intensity along these arcs must 
be the same at both detectors. By analyzing the intensity correlations along all possible 
common arcs, the unique relative orientation of the two measurements can be determined. 

It should be noted that a common arc can fix the relative orientation of two patterns only 
for experimental geometries with large scattering angle. Otherwise, the measured sector of 
the Ewald sphere can be considered as flat, and the common arc reduces to a straight line. 
This common line fixes only the angles and ip, but not the angle 6, therefore a simultaneous 
analysis of at least three diffraction patterns is needed in this case |27] . 

The projection of the common arc on the first detector (curve a in Fig. [2]) can be 
expressed in the detector 2D coordinate system {x,y) by the following equation (see for 
details Appendix A) 

(ql, - (g,o + Ky)x' + (q^, - (g,o + K)')y' + (1) 
^qyoqxoxy + 2dqxo{qzO + K)x + 2dqyo{q^o + K)y = , 

where d is the sample-detector distance and x, y are the coordinates of the common arc 
projection on the first detector. Similar projection of the common arc on the second detector 
(curve b in Fig. [2]) is also described by equation ([T]) by substituting x,y coordinates to x', 

y' = -y- 

As it follows from equation ([T]), the curvature of the common arcs a and b at the detectors 
one and two is determined only by the angle 6 and sample-detector distance d. Practically, 
the coordinates of the projections of common arcs at the detector planes are obtained by 
solving equation Q for each value of 9 and d with the fixed angles (J) = = 0. A set of 
curves corresponding to the fixed value of the angle 6 and all other values of angles and ijj 
is determined by rotation of the curve obtained on the previous step. This is implemented 
by rotation of the coordinate system (x, y) corresponding to the first detector by angle 
and the coordinate system {x',y') of the second detector by an angle ip (see Fig. |2] and 
Appendix B for details). 
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For each set of Euler angles and fixed sample-detector distance d the coordinates of both 
arcs (a and b in Fig. [2]) are determined, and the intensities along these lines are compared 
by calculating the cross- correlation coefficient (CCC) c"'^{(j),9,ip) 



where 7^(^,0) = ln[/f'''(^,0) + l] are logarithms of the intensities /f 0) and li{0, 0) along 
the first (a) and second (6) arc. The correct orientation of the second measurement with 
respect to the first one is given by the set of angles (f) 3,^3,4^3 that maximize the CCC in 
Eq. ([2]). To determine this orientation all three Euler angles (0, 9, ip) are varied sequentially 
with some angular step and CCCs for every orientation are calculated and compared. This 
procedure is applied to all diffraction patterns until their orientation relative to the first one 
is determined (Fig. |3| 

It should be noted here that not only orientation but also the position of a particle 
in space should be taken into account explicitly in the analysis. The transverse position 
relative to the incoming x-ray beam adds only a phase to the scattered amplitude and 
scales its intensity. If each diffraction pattern is properly centered this does not cause any 
problems in the analysis since the phase is not recorded by the detector. The intensity of each 
diffraction pattern can be rescaled at the stage of composing the 3D intensity distribution in 
reciprocal space, as described later. At the same time, the particle-detector distance d, must 
be taken into account explicitly, due to its strong influence on the diffraction pattern. If 
two measurements are performed at different sample-detector distances di and ^2, equation 
([1]) must be solved separately for both detectors taking into account the corresponding 
distances. This is especially important in real experimental conditions, when particles are 
injected in the beam, due to variations of the distance d from shot to shot. We also assume 
in our analysis that the particle size is much smaller than any variations of beam intensity. 
More details on our practical implementation of the common arc algorithm are presented in 
Appendix B. 

The common arc algorithm described in this section performs well for data sets with high 
signal to noise ratio [25]. However, it often fails in practical applications for a low number of 
scattered photons. One way to overcome this problem is presented in the following section. 



III. ADVANCED ALGORITHM FOR ORIENTATION DETERMINATION IN 
THE PRESENCE OF NOISE 



In the previous section, the common arcs between one diffraction pattern (that we define 
as a base pattern) and all other diffraction patterns were analyzed. At the same time we 
should note that each diffraction pattern has a common arc with other diffraction patterns 
(see Fig. [s]). Therefore, common arcs between all patterns could, in principle, be ana- 
lyzed simultaneously. Such analysis can significantly improve the fidelity of the orientation, 
however, its practical implementation requires an increase in computational resources. A 
compromise can be found by implementing the following strategy. As a first step a set of 
base diffraction patterns A^^ase is analyzed with respect to each other to determine the cor- 
rect orientations of these chosen patterns. In the next step all other patterns are oriented 
with respect to each of these base patterns. This implementation requires Nbase times more 
calculations compared to a single base pattern. In the final step all intensities are mapped 
to a 3D array of voxels in reciprocal space by 3D gridding and averaging procedure. The 
benefit of this approach is the possibility to solve the orientation problem for noisy data, as 
will be demonstrated in the following section (see also for a detailed discussion Appendix 
C). 

In a real experimental situation all diffraction patterns have different intensities due to 
shot to shot intensity jitter of the FEL and the fact that each injected particle is hit by a 
different part of a focused beam. As a consequence all measured diffraction patterns have 
to be rescaled. This is implemented in the algorithm by utilizing the fact that each of two 
patterns have a common arc and that the intensities along this arc must be equal. The 
scaling factor for the intensities can be determined by taking the ratio of intensities of two 
diffraction patterns along the common arc. Having all information about the experimental 
geometry, orientation, and scaling factor for each pattern the 3D intensity distribution in 
reciprocal space can be constructed. 

The orientation determination can be significantly improved by an additional refinement 
that is based on the correlations between an individual pattern and the whole 3D intensity 
distribution. It can be implemented in the following way. First, the 3D intensity distribution 
is obtained from all but one selected diffraction pattern. Then the orientation of the selected 
pattern is varied in a small angular range and the correlation between this 2D pattern and the 
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whole 3D intensity distribution is analyzed. The orientation corresponding to the highest 
correlation value is considered to be the correct one. Then, the rescaled intensity of the 
selected pattern with the refined orientation is included in the new 3D intensity distribution 
in which the next diffraction pattern is excluded and the refining procedure is repeated. 
By applying this approach to all diffraction patterns the final 3D intensity distribution is 
obtained. This procedure can also be applied to identify diffraction patterns from "wrong" 
particles (the ones that do not belong to a set of samples under investigation). Clearly, 
correlation coefficients of diffraction patterns originating from these particles and the whole 
3D intensity distribution will be quite low, which can be used as a criteria for rejection of 
these diffraction patterns from the future analysis. 

If the structure has a known symmetry, this can be used as an additional constraint for 
orientation determination [271 El] ■ Using symmetry conditions each diffraction pattern can 
be oriented individually with respect to the selected symmetry axis. This is contrary to 
the structures without symmetry when at least two patterns are required for orientation 
determination. Applying symmetry conditions it is possible to get sufficient number of 
diffraction patterns for 3D representation of the scattered intensity in reciprocal space even 
with a limited data set or a large area of missing data due to a big beamstop. This approach 
was successfully used for simulated data for a sample with icosahedral symmetry discussed 
in the next section as well as for experimentally measured diffraction patterns of a Mimi 
virus obtained in coherent diffraction imaging experiment at FLASH [35] . 

It is interesting to note that presented implementation of the common arc algorithm also 
allows to determine the unknown symmetry of the object. This can be obtained by the 
analysis of angular orientations appearing with the highest probability. Such orientations 
can be found in a 3D angular map (0, 6*, tp) of all possible orientations and reveal themselves 
as regions with high density (see for details Appendix C). For example, for structures with 
icosahedral symmetry it will correspond to 120 most likely orientations in reciprocal space 
that are related to the icosahedral symmetry transformation matrix. 

A sampling rate of at least two in each direction in diffraction pattern is required for a 
successful implementation of the algorithm described here. The same requirement is valid 
for the phase retrieval algorithms applied for the reconstruction of electron density of the 
samples. A higher sampling rate is beneficial for orientation determination because each 
speckle consists of more pixels. At the same time data with a lower sampling rate have 
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a higher signal in each pixel, which could become important for orientation determination 
of data with a low signal to noise ratio By testing different sampling conditions we 

found that an optimal sampling rate is in the region from two to three. In practice, to 
increase signal the experimental data could be binned for orientation determination, while 
the reconstruction is performed on the original unbinned data set. Applying on a final 
step phase retrieval algorithms fi9\ [20] to the 3D data set of the intensity distribution, the 
electron density of the sample can be obtained. 

IV. NUMERICAL TEST OF THE ALGORITHM 

The algorithm was tested with two different biological structures. The first one was 
an artificial protein structure without any symmetry combined from the 2BTV and 8RUC 
macromolecular structures [37] (see Fig. |4|a). It has a size of 13 x 19 x 28 nm^ and consists 
of about 124 000 non-hydrogen atoms. The second one was a human adenovirus penton 
base 2 12 chimera 2c6s [37j. It has icosahedral symmetry with the diameter of 27 nm and 
consists of about 200 000 non-hydrogen atoms (Fig. |5|a). 

Diffraction patterns for both structures where calculated [38j at 3 A wavelength, with a 
detector size of 100 mm and a sample-detector distance of 50 mm, providing the maximum 
scattering angle of 45°. The achievable resolution in this geometry was 3.92 A at the detector 
edge and 3.3 A at its corner. The number of detector pixels was 512 x 512 for the first sample 
(providing minimum sampling rate of 2.5) and 360 x 360 for the second (with a sampling 
rate of two). The incoming flux was 10^^ photons focused uniformly on a 100 x 100 nm^, 
and Poison noise was added to each diffraction pattern. The average flux at the edge of 
the detector was 0.05 and 0.15 photons per pixel corresponding to 0.45 and 0.6 photons per 
Shannon angle for the first and second structure, respectively. For the first structure 36 x 36 x 
18 =23 328 patterns were simulated with a 10° increment for each Euler angle. For the second 
structure 12 000 randomly oriented patterns were simulated. A beamstop with diameter of 
about 2 mm covering 1.5 speckles was introduced in all simulated diffraction patterns, and 
this region was excluded from the calculation of correlation coefficients. Typical diffraction 
patterns for a single FEL pulse simulated in the experimental conditions described above 
are shown in Fig. |4|b and Fig. [5}b for the first and second structure, respectively. 

The correct orientation of each diffraction pattern was determined using the algorithm 
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described in the previous sections. This allowed us to obtain the full 3D intensity distribution 
in reciprocal space for each sample. A central slice through this distribution constructed from 
the oriented diffraction patterns corresponding to the first structure is presented in Figure 
|4|d. For comparison, the same slice through the 3D intensity distribution obtained from 
the known orientation of each diffraction pattern is shown in Fig. |4|c. It is well seen that 
the slice obtained as a result of orientation determination well reproduces all features of an 
"ideal" intensity distribution, small deviations can be attributed to angular misalignment. 
This misalignment between the angles obtained from the common arc algorithm and the 
correct angles for the first structure is presented as a plot in Fig. [6] The accuracy of the 
orientation determination correlates strongly with the angular step size for the Euler angles 
{(f), 6, ip). Clearly, a finer angular step size requires more computational time that scales 
as a third power of the step size. It is well seen in Fig. [6] that a three degree angular step 
being five times slower still gives higher accuracy in orientation determination comparing 
to a five degree step. An additional improvement in the angular determination is obtained 
by the final orientational refinement of each diffraction pattern with respect to 3D intensity 
distribution, as described in the previous section (see Fig. [6]). 

It is interesting to observe how the signal is increased by the number of diffraction patterns 
used in the analysis. In Fig. |5}d a central slice through the 3D array representing the 
number of patterns contributing to each voxel of the constructed 3D intensity distribution 
is presented. It can be seen from this figure that at least 100 patterns from the analyzed 
12 000 contribute to each voxel inside a resolution ring of 4 A. One more intriguing feature 
can be observed in this figure. Though the initial diffraction patterns were simulated up to 
3.92 A resolution at the edge of the detector the 3D intensity distribution obtained from the 
algorithm has distinguishable features up to 3.3 A resolution (see dark outer ring in Fig. 
Isld). This additional signal comes from the corners of the diffraction patterns. 



V. SUMMARY 

In summary, we proposed a method for the angular orientation determination in single 
particle coherent imaging experiments based on the common arc algorithm. We obtained 
a significant improvement of this approach by introducing a simultaneous analysis of the 
common arcs for several diffraction patterns. This gives the possibility to apply the method 
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to data with a low level of signal to noise ratio as well as to skip classification step which 
can reduce achievable resolution. Additionally, we proposed an orientational refinement of 
diffraction patterns that can improve the quality of the final 3D intensity distribution in 
reciprocal space. 

The algorithm proposed here has several advantages compared to other approaches [29l 
[31] . It scales linearly with the number of measured patterns and total number of pixels in 
the diffraction patterns. The algorithm is easy to parallelize, because most of the cross- 
correlation analysis is performed between pairs of independent diffraction patterns. It has 
minimum memory requirements, because the data can be processed in parts. 

We foresee that this approach has the potential to be the key for the success in the analysis 
of single particle diffraction imaging experiments and will allow to reach sub-nanometer 
resolution in three-dimensional imaging of biological specimens. 
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Appendix A: Derivation of the main equations 

Equation ([!]) was derived using the following considerations. Both intersecting Ewald 
spheres (Fig. [7| have radii equal to the wave vector K = 27r/A (|Kip = |Kfp = A'^), 
where A is wavelength of the incident radiation. Therefore the coordinates [q^, Qy, Qz) of the 
intersection curve must satisfy the equations 



ql + q'y + {q. + Kr = K\ 



(Al) 



(A2) 
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The center of the second Ewald sphere (g^o, Qyo, Qzo) hes on the distance K from the center 
of reciprocal space (|Kj2p = K^), therefore 



2,2,2 



(A3) 



From equations (Al - A3) the formula describing intersection of two Ewald spheres can 
be derived: 

(?Jo + + "^Qyogzoigz - K)qy + (g^^ + g2j^2 ^ K^{^ql^ _ g2j _ 2KqlQq-, = 0. (A4) 

As soon as the diffracted vector K/i (Fig. [?]) has the same direction in both real and 
reciprocal spaces (angles coincide) the following relation between coordinates of a pixel on 
the detector {x, y, z) in real space and corresponding coordinates of the end of Kj {q^, qy, qz) 
in reciprocal space can be written: 



(A5) 



X y z 

Qx Qy Qz ' 

As soon as the distance from the sample to the detector (d) is fixed, z = d. Equation ([T]) 



can be easily derived from equations (A4, A5). 



Appendix B: Common arc algorithm 

Due to the properties of Euler angles, angle (0 < < 27r) can be attributed to the 
rotation of the reciprocal space coordinate system around the incident beam (Kji), angle 6 
{0 < 6 < 7i) corresponds to the rotation around the new position of vector q^, and angle i/) 
{0 < < 2tt) is the final rotation of the coordinate system around the new position of the 
vector - vector Kj2 in Fig. |2} 

In practice, the curvature of the common arcs a and b in Fig. [2] is determined only by 
the angle 6 and distance d. The coordinates of the projections of common arcs on detector 
planes can be obtained for each value of 6 and d, with the fixed angles (p = ip = 0, i>Y solving 
equation Q. Other curves (for all values of angles (p and tp) at the fixed value of angle 9 
are determined by rotation of the curve obtained at the previous step. Coordinate system 
(x, y), corresponding to the first detector, is rotated by an angle (p and the coordinate system 
{x',y'), corresponding to the second detector, by an angle tp (Fig. [2]). 

The common arc algorithm described in this paper was implemented using the following 
scheme (supplementary Fig. |8]). Calculation starts with fixed angles = and ip = 0. Then 
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coordinates qxo, %o, Izo are calculated by Euler rotation on 9 angle of the vector Kji (point 
(0,0, —K)). After this equation ([T| is solved and coordinates (x,y) for the common arc are 
found. This curve is rotated on angle for the first pattern (each pair of (x, y) is multiplied 
by corresponding rotation matrix) and on the angle ip for the second pattern (with exchange 
y — )■ —y). After full determination of the curves for both patterns corresponding values of 
intensities (along the curve) are extracted using the interpolation described below. The 
intensities along the curves are then correlated. This process continues for all angles in 
the region < ip < 2tt and for all angles in the region < < 27r. Then the whole process 
is repeated for different 9 value. 

The common arcs approach has difficulties when angle 9 is large. In this case the in- 
tersection between two spheres reduces to a closed circle. When angle 9 approaches vr this 
circle shrinks to a point at the origin of reciprocal space coordinates (0,0,0). Therefore at 
large 9 (close to vr) the projection of the common arc to the detector plane, described by the 
equation ([T]), degenerates to an ellipse. Therefore the length along an arc and a circle can 
be different. Moreover the correlation coefficients found for the arcs with different curvature 
can hardly be compared, because the ends of such curves correspond to different g-range 
and so some curves will have good signal at the ends and some - mostly noise. From this 
considerations it is clear that it is difficult to compare curves obtained for small and big 
9 angles. To solve this problem we limited the range of acceptable 9 angles to the range 
< 9 < 71/2. To cover the range of angles 7^/2 < 9 < n we used the fact that reciprocal 
space is centro-symmetric for scattering on non-absorbing objects (Friedel's law). To take 
this into account for angles n/2 < 9 < vrwe invert direction of the vector Kj2 (Fig. [2]) 
to its opposite — Kj2, that corresponds to the following transformation: rotation of by vr 
(0 — 7- + 7r), rotation of by — t- tt — and final rotation of ■0 by vr (-0 — > ^/^ + tt). To finish 
inversion transformation we change y — )• —y (equation [T]) in detector plane. 

To find the intensities corresponding to each point of the curve described by equation 
([T|, some sort of gridding must be performed. In our calculations we used interpolation in 
the form of an average of four nearest neighbor pixels. We checked also other schemes of 
interpolation: nearest neighbor and bilinear interpolation [SH]. The first one is faster but 
lacks accuracy, the second is computationally much slower without noticeable increase in 
quality. 

All common arcs for different 9 values (different curvature) should cover the same q- 
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distance in reciprocal space. Therefore the step between points on the curve should remain 
constant. For this reason equation Q was differentiated analytically and starting from the 
center of detector {x = y = 0) each next point on the curve is calculated keeping the distance 
{dx"^ + dy"^) = const. 

The accuracy of calculations of cross-correlation coefficient (equation |2]) for all patterns 
can be increased by replacing intensities I{q) with their logarithms, more precisely by ln(l + 
/(q)). Also for noisy data the following form of CCC is more beneficial: 

^at^. Q -^ ^ E (^"(qi)- < ^°(|qil) >)(^^(qi)- < ^^(Iqil) >) r^^. 
VE in^i)- < >) VE (^^qi)- < J'ihil) >)^' 

where J(q) = ln(l + /(q)) and < J(|qi|) > is a radial averaged value of intensity corre- 
sponding to a ring with the radius |qi| on a diffraction pattern. 



Appendix C: Advanced algorithm for processing noisy data 

Fig. [9] shows typical correlations, calculated with equation ([2]), between two noisy diffrac- 
tion patterns for different angles with fixed angles 6 and ip (bottom curve). For comparison, 
correlation coefficients corresponding to a perfect data set are plotted in the same Fig. [9] 
(top curve). The best correlation coefficient between two noisy patterns can correspond to 
completely wrong orientation (like point B in Fig. [9]). Therefore several orientations {Nangi) 
corresponding to the best set of correlations must be stored. To avoid storage of almost 
identical angles, some tolerance Atoi in the best orientation angles determination is neces- 
sary. Only one angle in the range ±^4^/ with the highest correlation coefficient is stored. 
This leads to storage of only one angle per tolerance region (rectangles in Fig. [9]). Therefore 
only one point per marked rectangle in Fig. |9] is stored in the list of "best" angles (for 
example in Fig. |9]A^angZ = 10 and Atoi = 5°). 

The algorithm of orientation determination is following. In the first step all base patterns 
are correlated to each other using the algorithm described in the section [Tij As a result, 
Nangi "bcst" auglcs bctwcen each pattern and all other N^ase — 1 base patterns are stored. 
Therefore each pattern has {N^ase — ^)Nangi "best" angles with respect to all other base 
patterns. Then all these angles are recalculated with respect to one selected pattern (which 
attributed all angles equal to 0) - we shall call it the zeroth pattern. This is done in the 
following way: each pattern's base angles with respect to other bases are recalculated to 
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angles with respect to zeroth pattern taking into account that each base has Nangi angles 
with respect to zeroth. In this way {Nbase — '^)Nlngi + ^angi angles for each pattern with 
respect to zeroth are determined. 

Let's explain this step on an example. Consider one of the base patterns Pj. This pattern 
has Nangi angles with respect to the zeroth pattern Pq. The pattern Pi has also Nangi angles 
with respect to the first pattern Pi. But the first pattern Pi itself has Nangi angles with 
respect to the Pq. So, pattern Pj has already Nangi + N^ngi angles to the Pq. Then it also has 
N^ngi angles to the Pq through the second pattern P2. This process is continued for all base 
patterns. Finally all {Nhase ~'^)^lngi + ^angi angles determined for one base pattern (Pj) are 
plotted in 3D space of angles (0, 6, ip) and the angular region with the maximum density of 
points is selected as the best angle. In practice it is done in the following way: a number of 
points (in the tolerance region ±Atoi for each Euler's angle) near each point is calculated. 
The point with the biggest number of neighbors is considered to be the best estimate. Then 
the correct angle is determined by averaging of all positions of the neighbors. In Fig. 10 an 



example of such operation in 2D case (for angles 6 and with fixed tp) is presented. The 



best angle corresponds to the middle of the rectangle in Fig. 10 

As soon as all correct angles for the base patterns are determined, all other patterns can 
be oriented with respect to known bases. At this step only Nf,aseNangi angles need to be 
considered for each pattern under analysis in the algorithm described above. 

For better orientation determination the base patterns should be selected carefully among 
all measured. All base patterns should have different orientations, more precisely different 
6 angles. Because initially all angles are unknown this requirement can be satisfied by the 
analysis of 2D correlations between different patterns. This is performed by calculating 2D 
correlation between pairs of patterns for different angles (p and the best correlation coefficient 
is stored. Then patterns with the worst 2D cross-correlations between each other are selected 
as bases. Also for experimental data analysis, base patterns should be selected according to 
the recorded signal quality. 

The transformation to a Cartesian coordinate system in reciprocal space is performed in 
the following way. The whole reciprocal space is divided into elementary set of 3D voxels 
with the size corresponding to the pixel size of the detector. Each voxel could contain few 
values of the measured intensities that effectively increase signal in the 3D intensity intensity 



distribution (see Fig. |5|d). All intensities in each voxels are averaged and the full 3D dataset 
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is obtained. 

Orientation determination problem for the symmetrical structure (2c6s) without intro- 
ducing the symmetry is more difficult. This is due to the fact that instead of one dense spot 



in Fig. 10 there will be 60 (for structure with icosahedral symmetry) less dense spots in 3D 
angular space. So it is harder for algorithm to select the right orientation. The accuracy (or 
time) of orientations determination can be greatly improved if the symmetry of the sample 
is known. But we want to underline the important feature of the algorithm, that it can be 
used even for symmetrical data with unknown symmetry and also for the structures with 
pseudo-symmetry. 

There is one more important issue for speed and accuracy optimization while processing 
low flux data with noise. For the initial orientations determination there is no need to process 
high-Q region of diffraction pattern where the radially averaged photon counts is less then 
approximately 1-2 photons per Shannon pixel (a pixel with sampling rate equal to one). 
The region with smaller photon counts just lowers the accuracy of orientation determination 
based on common arcs. Of course this argument cannot be applied to objects with highly 
anisotropic scattering in different directions, like, for example, pyramids |10]. At the same 
time the final 3D reciprocal space and 3D angular refinement can be performed for the full 
datasets thus the whole procedure does not lower the resolution. 

The described in this chapter parameters for the two analyzed structures were: N^^ase = 
64 base patterns, angular step for each Euler's angle was 3°, Atoi = 5° and Nangi = 30. 
The initial orientation determination of the simulated diffraction patterns for both model 
structures was performed for the circular low-Q region, 150 pixels in diameter, whereas the 
3D refinement for 512 (8RUC) and 360 (2BTV) pixels with angular step of 0.5° in the range 
of 5° near the previously found position. The initial orientation determination of 23 328 
patterns with a 3° angular step and Nhase = 64 base patterns took about one week on a 
single 8-core computer. The refinement of the found orientation took about one day. 
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FIG. 1. (Color online) Schematic view of the experimental geometry, (a) In real space, diffraction 
pattern from a sample in random orientation is measured by a single FEL pulse, (b) In reciprocal 
space the measured diffraction pattern correspond to a cut of the 3D intensity distribution by an 
Ewald sphere sector. Vectors Kj and denote the incident and diffracted wavevectors. 




FIG. 2. (Color online) Measurements of two reproducible samples at random orientation can be 
considered as two measurements of the same sample with two different incident beam directions 
indicated by vectors Kji and Ki2. Angles ^,"0 are Euler's rotation angles. Points A and B are 
the centers of the corresponding Ewald's spheres. Coordinates on the first and second detector are 
indicated as x,y and x',y', respectively. 
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FIG. 3. (Color online) Few Ewald sphere sectors intersecting the 3D intensity distribution of the 
sample in reciprocal space. Yellow lines indicate common arcs between different patterns. 




FIG. 4. (Color online) Artificial protein structure without any symmetry combined from the 
2BTV and 8RUC macromolecular structures, (a) Iso-surface of the electron density, (b) single 
diffraction pattern (edge resolution 3.92A), (c,d) 2D central cuts (edge resolution 3.3A) through the 
constructed 3D intensity distribution in reciprocal space for the patterns with a known orientation 
(c), and the patterns with the orientations determined using the proposed algorithm (d). All 
diffraction patterns are presented in logarithmic scale. 
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FIG. 5. (Color online) Human adenovirus penton base 2 12 chimera 2c6s structure with the 
icosahedral symmetry. (a),(b),(c) The same as in Fig. |4| (d) number of diffraction patterns 



contributing to each voxel of (c) (see section III for details). All diffraction patterns are presented 
in logarithmic scale. 




Angular distribution, degrees 

FIG. 6. (Color online) Distribution of the angular error of the determined orientations for the 
structure without symmetry. Blue line corresponds to 5° angular step, green line 3°, and red line 
3° after the 3D refinement (see text for details). 



21 




FIG. 7. (Color online) Schematic view of the intersection of two Ewald spheres. 
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FIG. 8. Flowchart for efficient data analysis with common arc algorithm. 
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FIG. 9. (Color online) Correlation coefficient between two patterns for different (j) angles. Upper 
curve for ideal data, the lower one for the noisy data. 
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FIG. 10. (Color online) Two-dimensional {(p^O) distribution of angles corresponding to good corre- 
lation between a pattern and all base patterns. A blue box correspond to the region with highest 
density of good orientations. 
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